INTRODUCTION

This  paper  describes  a  research  and  development  effort  to  automate  and  improve  data  collection  associated  with 
USAF  occupational  surveys.  Specifically,  the  effort  involves:  (a)  research  and  development  of  a  PC-based 
procedure  for  self-administration  of  occupational  surveys  as  a  replacement  for  the  existing  paper-and-pencil  process, 
(b)  research  and  development  of  automated  scaling  procedures  of  optimal  validity  and  reliability  for  obtaining 
measures of time spent on job-related tasks, (c) incorporation of feedback and branching techniques into the
automated  survey  technology  that  will  permit  administration  of  large  and  complex  occupational  surveys,  and  (d) 
development  of  implementation  guidelines  for  use  in  base-level  computer  systems  and  AF-wide  electronic  data 
transmission  networks. 

The  automated  survey  technology  promises  to  provide  higher  quality  data  more  rapidly  for  addressing  urgent 
manpower,  personnel,  and  training  needs.  Current  methods  of  obtaining  and  processing  data  for  occupational 
analysis  are  slow,  complicated,  and  expensive,  each  step  involving  potential  problems  that  can  decrease  sample  size, 
lengthen  projects,  or  introduce  possible  errors  into  a  database  used  for  Air  Force  decision  making.  This  research 
should  result  in  improved  occupational  data  that  will  enhance  management  of  Air  Force  specialties,  estimates  of 
training  requirements,  determination  of  the  content  of  training  courses,  promotion  selection,  and  job  structuring. 
The  accuracy  of  this  information  is  becoming  more  crucial  with  the  projected  downsizing  of  the  Air  Force  and  the 
broadening  range  of  responsibilities  associated  with  various  jobs. 

A  laboratory  test  of  the  software  involving  572  randomly  sampled  subjects  from  67  Air  Force  specialties  (AFSs)  has 
been  completed.  The  steps  in  the  computer-administered  survey  process  are  documented  in  Albert  et  al.  (1993).  The 
test was conducted at the Armstrong Laboratory's Experimental Testing Facility at Lackland AFB, TX. This facility
provided  a  high  degree  of  experimental  control.  Trained  proctors  administered  the  survey  on  forty  identical  PCs. 
The  sample  was  selected  such  that  higher  and  lower  ability  airmen  were  adequately  sampled  from  technical  and 
nontechnical  specialties.  For  this  study,  lower  aptitude  was  defined  as  having  an  Armed  Forces  Qualification  Test 
score  of  49  or  less,  and  higher  aptitude  was  defined  as  having  a  score  greater  than  49.  This  score  separated  the 
lowest  quartile  of  airmen  aptitudes  from  the  upper  three  quartiles.  Technical  AFSs  were  defined  as  those  having  an 
Electronic  or  Mechanical  score  cutoff  and  nontechnical  AFSs  were  defined  as  those  having  an  Administrative  score 
cutoff.  AFSs  having  a  General  score  cutoff  were  classified  as  technical  or  nontechnical  depending  on  the  degree  to 
which  the  duties  and  tasks  were  technically  or  nontechnically  oriented.  To  measure  reliability  of  time  spent 
estimation  using  an  absolute  time  scale  and  four  experimental  scales,  each  job  incumbent  was  administered  the 
survey  twice  approximately  two  weeks  apart.  The  remainder  of  this  paper  will  discuss  the  criterion  and  experimental 
scales,  results  of  the  laboratory  test,  and  future  research. 

EXPERIMENTAL SCALES


Research  was  conducted  to  determine  the  accuracy  of  four  types  of  scales  for  obtaining  job  incumbent  estimates  of 
time spent on each task performed. The four scales were a Three-Stage Relative Time Spent Scale, a Direct
Magnitude Estimation Scale, an Indirect Magnitude Estimation Scale, and an End-Anchored Graphical Scale. Time
spent estimates from these four scales were compared with the criterion values, which were absolute time spent
estimates  based  on  a  cross-product  of  two  component  measures:  an  estimate  of  absolute  frequency  of  task 
performance  and  an  estimate  of  the  absolute  amount  of  time  normally  required  to  perform  the  task  once. 
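
As a minimal sketch of the criterion cross-product, assuming frequency is recorded as performances per year and duration as hours per performance (the function name, units, and example values are illustrative assumptions, not taken from the survey software), the absolute time spent on one task could be computed as:

```python
def absolute_time_spent(times_per_year: float, hours_per_performance: float) -> float:
    """Cross-product of the two component estimates, in hours per year."""
    if times_per_year < 0 or hours_per_performance < 0:
        raise ValueError("component estimates must be non-negative")
    return times_per_year * hours_per_performance


# Hypothetical task: performed about 156 times a year, half an hour each time.
hours_per_year = absolute_time_spent(156, 0.5)
```

Because each component is estimated for the task on its own, the product needs no comparison with other tasks.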

The  first  stage  of  the  Three-Stage  Scale  is  the  same  as  the  currently  used  nine-point,  relative  time  spent  scale.  Each 
respondent  was  asked  to  estimate  the  relative  amount  of  time  he/she  spends  performing  each  task  on  a  nine-point 
scale  ranging  from  "very  small  amount  of  time"  to  "very  large  amount  of  time"  compared  to  all  other  tasks  he/she 
performs.  At  the  second  stage,  the  respondents  were  provided  feedback  in  terms  of  groups  of  tasks  to  which  they 
gave  the  same  rating,  so  that  they  might  refine  the  task  ratings  by  using  the  groups  of  similarly  rated  tasks  as 
contextual  reference  points  for  locating  misrated  tasks  and  moving  them  to  task  groups  with  more  compatible  time 
spent  ratings.  At  the  third  stage,  the  refined  groups  of  tasks  were  fed  back  to  the  respondent  for  further  subdivision 
of  each  group  of  tasks  into  two  or  three  more  homogeneous  subgroups,  so  as  to  yield  up  to  9  X  3  =  27  rating 
categories  containing  one  or  more  tasks.  The  absolute  time  spent  values  for  several  high  and  low  time  consuming 
tasks  were  used  to  rescale  the  relative  time  spent  values  to  absolute  time  spent  values  at  each  stage  of  the  Three- 
Stage  Scale. 

The  Direct  Magnitude  Estimation  Scale  required  a  mid-range  task  as  an  anchor  point  against  which  all  other  tasks  to 
be  scaled  were  compared  by  numerically  estimating  their  time  spent  values  as  ratios  of  the  anchor  task's  time-spent 
value.  Several  mid-range  tasks  from  the  absolute  time  set  were  used  to  rescale  ratio  estimates  to  absolute  time. 

For  the  first  stage  of  the  Indirect  Magnitude  Estimation  Scale,  the  respondents  used  verbal  anchors  whose  numerical 
values  were  previously  derived  by  Direct  Magnitude  Estimation.  At  the  second  stage  of  this  scale,  the  respondents 
located  and  moved  misrated  tasks  as  they  did  for  the  second  stage  of  the  Three-Stage  Scale.  Rescaling  of  estimates 
to  absolute  time  for  this  scale  was  the  same  as  for  the  Direct  Magnitude  Estimation  Scale. 

For  the  End-Anchored  Graphical  Scale,  which  required  anchor  tasks  at  both  ends  of  the  scale,  the  respondents  rated 
each  task  by  indicating  its  time  spent  value  as  a  point  on  a  horizontal  line  joining  the  two  anchor  tasks.  The  anchor 
tasks  were  the  tasks  having  the  highest  and  lowest  absolute  time  spent  estimates.  Several  additional  tasks  from  the 
absolute time spent set were used to rescale this scale's ratings to absolute time spent values.

Subjects  within  each  aptitude  group  and  technical  AFS  vs.  nontechnical  AFS  classification  were  randomly  assigned 
to  each  experimental  scale.  For  each  scale,  the  job  incumbent  provided  a  total  time  spent  rating  on  each  task,  each 
time  spent  rating  was  transformed  to  an  absolute  time  spent  estimate,  and  the  job  incumbent  reviewed  and  made 
revisions  to  the  absolute  time  spent  estimates  ordered  high  to  low  on  time  spent.  The  accuracy  of  each  scale  (and 
each  stage  of  the  Three-Stage  and  Indirect  Magnitude  Scales)  was  determined  by  comparing  the  rescaled  time  spent 
estimates  of  the  experimental  scale  for  each  job  incumbent  with  his/her  edited  absolute  time  spent  estimates.  For  the 
Three-Stage  Scale,  equal-interval  and  ratio-interval  estimates  of  absolute  time  spent  were  computed  and  presented 
for  evaluation  as  two  separate  vectors. 


RELIABILITY  ANALYSES  OF  SCALES 
CRITERION  SCALE  RELIABILITY 

In  order  to  produce  a  criterion  value  that  would  be  comparable  across  scales,  all  reliability  analyses  were  conducted, 
not  at  the  task  level,  but  at  the  case  level,  with  the  criterion  for  each  case  being  the  Fisher  Z  value  corresponding  to 
the correlation of each case's absolute time spent ratings across the two administrations. Fisher Z values were
averaged for all cases in each scale type and reconverted to an average correlation (mean r). Table 1 shows, for each
experimental scale (treatment) and across all treatments, the number of respondents (N), the range, mean (M), and
standard deviation (SD) of the number of tasks selected, the mean Fisher Z, the mean r, and the standard deviation of the
Fisher Zs (SDz). To get an acceptable measure of the reliability and validity of the time spent responses, a minimum
of seven tasks was required to be selected by each subject; consequently, the data for eight subjects were excluded
from analysis because they responded to six or fewer tasks. The correlations of the absolute time estimates ranged
from -.28 to 1.00. There were no significant differences (p > .05) among the mean Fisher Zs across scales.


Table 1. Absolute Time Spent Reliability

Treatment            N     M/SD    Range    Z/SDz/r
Three Stage          145   85/58   7-303    .74/.52/.63
Direct Magnitude     130   91/76   8-449    .86/.56/.70
Indirect Magnitude   137   85/62   7-366    .76/.63/.64
End Anchored         152   85/71   7-334    .79/.56/.66
Average              141   87/67   7-449    .78/.57/.65
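
The case-level averaging behind Table 1 can be sketched as follows. This is an illustration only: the function names and the four example correlations are hypothetical, and the clipping of correlations near 1.00 is an assumption, since the paper does not say how a correlation of exactly 1.00 was transformed.

```python
import math


def fisher_z(r: float, limit: float = 0.9999) -> float:
    """Fisher r-to-Z transform (inverse hyperbolic tangent).

    Assumption: |r| is clipped just below 1 so the transform stays finite.
    """
    r = max(-limit, min(limit, r))
    return math.atanh(r)


def mean_correlation(correlations):
    """Average per-case correlations in Z space, then convert back to r."""
    zs = [fisher_z(r) for r in correlations]
    mean_z = sum(zs) / len(zs)
    return mean_z, math.tanh(mean_z)


# Hypothetical test-retest correlations for four respondents.
mean_z, mean_r = mean_correlation([0.40, 0.62, 0.75, 0.88])

# Back-transforming the mean Fisher Z values reported in Table 1.
table1_r = [round(math.tanh(z), 2) for z in (0.74, 0.86, 0.76, 0.79, 0.78)]
```

Applying the back-transform to the mean Z column of Table 1 reproduces the tabled correlations (.63, .70, .64, .66, .65).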

In addition to the reliability measure shown in Table 1 (hereafter referred to as Z_A to denote that it was computed
using information from all tasks selected by each respondent), five other measures of criterion reliability were
computed: Z_W (Fisher Z for the weekly tasks), Z_E (Fisher Z for the essential tasks), APE_A (average proportional
error for all tasks selected by the respondent), APE_W (average proportional error for the weekly tasks), and APE_E
(average proportional error for the essential tasks). The average proportional error for subject j over any set of
tasks i = 1, ..., N_j is defined as:

    \mathrm{APE}_j = \sum_{i=1}^{N_j} \mathrm{APE}_{ij} / N_j ,

where N_j is the number of tasks responded to by case j and APE_{ij} is the absolute value of the difference in absolute
time estimates at time 1 and time 2 for task i divided by the larger of the estimates.
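
A short sketch of the average proportional error for one respondent follows. The function names and the example estimates are hypothetical, and treating a task with two zero estimates as having zero error is an assumption the paper does not address.

```python
def proportional_error(time1: float, time2: float) -> float:
    """Absolute difference between the two estimates divided by the larger one."""
    larger = max(time1, time2)
    if larger == 0:
        return 0.0  # assumption: two zero estimates are in perfect agreement
    return abs(time1 - time2) / larger


def average_proportional_error(estimates_time1, estimates_time2) -> float:
    """Mean proportional error over the tasks rated at both administrations."""
    if len(estimates_time1) != len(estimates_time2) or not estimates_time1:
        raise ValueError("need the same non-empty set of tasks at both times")
    errors = [proportional_error(a, b)
              for a, b in zip(estimates_time1, estimates_time2)]
    return sum(errors) / len(errors)


# Hypothetical hours-per-year estimates for three tasks at time 1 and time 2.
ape = average_proportional_error([100.0, 40.0, 10.0], [80.0, 50.0, 10.0])
```

Each task contributes a value between 0 (identical estimates) and 1, so lower APE values indicate higher test-retest agreement.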

Using Z_W, APE_W, Z_E, and APE_E as criteria, regressions were computed to determine if the reliability of absolute time
estimates for the weekly and essential task subsets varied according to the type of scale assigned to the subject. As
expected, no significant differences were observed (p > .05). In addition, similar results were obtained with APE_A as
the criterion as were obtained for Z_A. Finally, regressions were computed to see if the reliability of absolute time
estimates for the subset of experimentally scaled tasks (up to 36 per rater) varied according to experimental scale
assigned. Again, no significant differences were observed (p > .05). From these results, we can infer that the
validity results presented later will not be affected by differential reliability being present among the sets of subjects
assigned to each experimental scale.


EXPERIMENTAL SCALE RELIABILITY

For  the  Three-Stage  Scale,  two  measures  of  reliability  (Fisher  Z  and  APE)  were  computed  for  each  of  the  three 
stages  and  three  data  types  (raw,  interval  scaled  hours,  and  ratio  scaled  hours).  There  were  no  significant  differences 
(p  >  .05)  among  the  reliabilities  at  each  stage.  The  mean  Fisher  Z  varied  from  .59  (Stage  2,  raw)  to  .70  (Stage  1, 
ratio)  and  the  mean  APE  ranged  from  .24  (Stages  1  and  2,  raw)  to  .66  (Stage  2,  ratio).  Transformation  of  the  mean 
Fisher  Z  yielded  a  range  of  correlations  from  .53  to  .60.  The  SDs  of  the  Fisher  Zs  were  large,  ranging  from  .29 
(Stage 1, raw) to .73 (Stages 1 and 3, ratio), and, similarly, the SDs of the APEs were also large, ranging from .12
(Stages  1  and  2,  raw)  to  .21  (Stage  1,  interval).  In  addition,  chi-square  results  showed  that  respondents  perceived 
interval-scaled  data  to  be  more  accurate  than  ratio-scaled  data  (p  <  .05),  although  from  a  practical  standpoint  the 
preference  was  slight  (57  percent  at  each  administration). 

For the Direct Magnitude Scale, the reliability results were very similar for both the raw data (mean Fisher Z = .53,
APE = .50) and the raw data converted to hours (mean Fisher Z = .55, APE = .51).

For the End-Anchored Graphical Scale, the reliability results based on the Fisher Z statistic were also very similar for
both the raw data (.77) and the raw data converted to hours (.80); however, the average proportional error was
substantially less for the raw data (.36) than for the converted data (.56).

For the Indirect Magnitude Scale, there were no significant differences (p > .05) between reliabilities at each
stage. The mean Fisher Z varied from .87 (Stage 2) to .90 (Stage 1) and the mean APE ranged from .32 (Stages 1
and 2, raw) to .44 (Stages 1 and 2, hours). Transformation of the mean Fisher Z yielded a range of correlations from
.70 to .72. The SDs of the Fisher Zs ranged from .36 to .39 and the SDs of the APEs ranged from .09 to .13.
Therefore, across all scale/stage/data type combinations, Stage 1 of the Three-Stage Scale gave the best results with
APE  as  the  measure  of  reliability.  On  the  other  hand,  the  Indirect  Magnitude  Scale  gave  the  best  results  with 
correlation  as  the  measure  of  reliability;  however,  the  reliability  associated  with  the  End-Anchored  Scale  was  not 
significantly  lower  (p  >  .05)  than  the  reliability  associated  with  the  Indirect  Magnitude  Scale. 


SCALE  VALIDITY  ANALYSES 
VALIDATION  OF  CRITERION  SCALE 

Before  discussing  the  validity  of  the  experimental  scales,  the  validity  of  the  absolute  time  spent  estimation  procedure 
(criterion  scale)  must  be  established.  First,  it  can  be  argued  that  the  absolute  time  spent  scale  possesses  superior 
content  and  construct  validity  relative  to  the  four  experimental  scales.  While  all  the  experimental  scales  are  focused 
on  total  time  spent  on  a  task,  the  absolute  time  spent  scale  decomposes  total  time  into  its  two  basic  components: 
“time  to  perform  a  task  once”  and  “frequency  of  performance.”  “Frequency”  has  been  shown  time  and  again  in  the 
literature  to  be  a  measure  which  corresponds  to  an  innate  counter  mechanism  we  all  possess,  a  perceptive  ability  that 
is  consistently  more  accurate  than  that  which  governs  our  perception  of  time.  In  addition,  “time  to  perform  a  task 
once”  has  the  advantage  of  representing  an  average  or  median  value  rather  than  a  total.  For  example,  if  you  were  to 
look  at  a  long  column  of  numbers,  you  could  more  quickly  approximate  a  reasonably  accurate  median  value  than  a 
sum for that set of numbers. Second, the absolute time scale permits the rater to respond unambiguously without
need of translation or transformation of arbitrary scale values, such as those found in a 1-to-9-point relative time
spent  scale.  Third,  a  task  need  not  be  compared  with  another  task  or  “all  other  tasks”  in  order  to  make  a  frequency 
or  time  estimate.  In  other  words,  the  absolute  scale  does  not  require  task  comparisons.  Fourth,  since  the  absolute 
time  scale  is  not  relativistic,  neither  is  it  ipsative,  as  is  the  9-point  relative  time  spent  scale,  which  requires  that  each 
task  be  rated  relative  to  “all  other  tasks  I  perform.”  Fifth,  the  absolute  time  scale  sets  no  arbitrary  limit  on  the 
magnitude  of  responses;  whereas,  the  9-point  relative  time  spent  scale  severely  limits  the  magnitude  of  a  response, 
both in terms of the number of scale points available and the maximum weight a scale point can have as the number
of  tasks  rated  increases.  In  the  final  analysis,  if  we  are  willing  to  accept  that  the  most  valid  scaling  procedure  is  the 
one  that  allows  the  rater  to  say  exactly  what  he/she  wants  to  say  with  a  minimum  of  ambiguity,  then  the  absolute  time 
spent  scaling  procedure  would  certainly  merit  the  role  of  criterion  as  compared  to  the  four  experimental  scales. 

Two  procedures  were  applied  to  provide  empirical  validation  of  the  criterion  scaling  procedure.  In  the  first 
procedure,  every  subject  was  presented  pairs  of  time  spent  estimates  in  terms  of  hours  per  month  for  up  to  10  tasks 
rated  by  that  subject  on  both  the  absolute  time  spent  scale  and  the  assigned  experimental  scale.  The  tasks  selected 
were  those  with  the  greatest  discrepancy  between  the  two  estimates.  The  source  of  the  estimates  was  not  identified 
and the order of presentation was randomized. The subject was asked to select the estimate that he/she felt was the
more accurate of the two. Preliminary analysis indicated that there was no bias toward selecting the first or the
second estimate. If the criterion scale was truly more valid than any of the experimental scales, a significantly higher
proportion  of  absolute  time  estimates  should  have  been  selected  as  more  accurate  than  estimates  derived  from  any  of 
the  four  experimental  scales.  As  expected,  the  absolute  time  spent  estimates  were  selected  more  often  than  the 
experimental  scale  estimates,  regardless  of  experimental  scale,  at  both  time  1  and  time  2.  The  percentage  of  times 
that  the  absolute  time  spent  estimate  was  chosen  ranged  from  52%  (Direct  Magnitude,  time  1)  to  56%  (End 
Anchored  and  Three  Stage,  time  1).  A  value  of  p  <  .001  was  associated  with  most  of  the  computed  chi-square 
values. 
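
The paper does not state which chi-square test was used. As one plausible reading, the sketch below tests the observed choices against an even 50/50 split (goodness of fit, 1 degree of freedom). The function name and the counts are hypothetical; only the 56% proportion echoes the reported results.

```python
import math


def chi_square_even_split(chose_absolute: int, chose_experimental: int):
    """Goodness-of-fit chi-square (1 df) against a 50/50 split of choices."""
    total = chose_absolute + chose_experimental
    expected = total / 2
    chi2 = ((chose_absolute - expected) ** 2
            + (chose_experimental - expected) ** 2) / expected
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1,400 paired choices, 56% favoring the absolute estimate.
chi2, p_value = chi_square_even_split(784, 616)
```

With samples of this size, even a modest preference such as 56% versus 44% falls well below the .001 level, which is consistent with the pattern reported above.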

In  the  second  procedure,  each  subject  was  presented  the  list  of  experimentally  scaled  tasks.  The  subject  was  asked  to 
check  those  tasks  he/she  had  performed  within  the  last  five  working  days.  The  results  of  this  exercise  were  as 
follows:  (1)  At  time  1,  70%  of  the  tasks  that  had  been  checked  as  tasks  performed  at  least  once  a  week  in  the 
absolute  time  spent  procedure  were  checked  in  this  exercise  as  having  been  performed  within  the  last  five  working 
days.  At  time  2,  the  percentage  was  68%;  a  chi-square  test  across  all  cases  (weekly  vs.  nonweekly  and  recently 
performed vs. not recently performed) yielded a chi-square value with an associated probability of p < .001. (2) The
identifications of tasks as performed or not performed within the last five working days at time 1
correlated .50 with the time 2 identifications. (3) A t-test was computed between the mean number of times per year
tasks were performed if they were identified as recently performed vs. the mean number of times per year for tasks
not recently performed. Both time 1 and time 2 data yielded t-values with p < .001. The results of this procedure
present evidence confirming the validity of the frequency measure used as a component of the absolute time spent
scale and thereby provide a partial empirical validation of the criterion scaling procedure.

COMPARATIVE  VALIDITY  OF  EXPERIMENTAL  SCALES 

The  first  step  in  determining  the  comparative  validity  of  the  four  experimental  scales  was  to  identify  the  most  valid 
form  of  each  scale  to  use  in  the  comparison.  For  all  scales,  the  “best”  functional  form  relating  it  to  the  criterion  scale 
was sought. The forms considered were Y = a + bX, Y = a + b ln X, Y = a + b1X + b2X^2, ln Y = a + bX, ln Y = a + b
ln X, and ln Y = a + b1X + b2X^2. For the Three-Stage Relative Time Spent Scale, additional alternatives had to be
considered,  such  as  which  of  the  three  stages  was  most  valid  and  did  ratio-interval  or  equal-interval  rescaling  of  the 
ratings  into  estimated  hours  per  year  provide  a  better  fit  of  the  data  to  the  criterion  values.  For  the  Indirect 
Magnitude  Estimation  Scale,  a  determination  as  to  which  of  the  two  stages  was  most  valid  had  to  be  made. 
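
The comparison of functional forms can be sketched as below, using only the Python standard library. Everything here is illustrative: the data are synthetic, the helper names are hypothetical, and judging each form by the correlation between its predictions (back-transformed to the Y scale where needed) and the criterion values is an assumption about how "best" fit was assessed.

```python
import math


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(design, target):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u)
                           * sum((b - mv) ** 2 for b in v))


# (label, design-row builder for X, whether Y is log-transformed)
FORMS = [
    ("Y = a + bX", lambda x: [1.0, x], False),
    ("Y = a + b ln X", lambda x: [1.0, math.log(x)], False),
    ("Y = a + b1X + b2X^2", lambda x: [1.0, x, x * x], False),
    ("ln Y = a + bX", lambda x: [1.0, x], True),
    ("ln Y = a + b ln X", lambda x: [1.0, math.log(x)], True),
    ("ln Y = a + b1X + b2X^2", lambda x: [1.0, x, x * x], True),
]


def compare_forms(x, y):
    """Fit every form and return {label: (coefficients, correlation with Y)}."""
    results = {}
    for label, build_row, log_y in FORMS:
        design = [build_row(xi) for xi in x]
        target = [math.log(yi) for yi in y] if log_y else list(y)
        coef = least_squares(design, target)
        pred = [sum(c * v for c, v in zip(coef, row)) for row in design]
        if log_y:
            pred = [math.exp(p) for p in pred]  # back to the Y scale
        results[label] = (coef, pearson(pred, y))
    return results


# Synthetic example: nine ratings and criterion hours generated by a parabola.
ratings = [float(v) for v in range(1, 10)]
hours = [2.0 + 0.5 * v + 1.5 * v * v for v in ratings]
results = compare_forms(ratings, hours)
best_form = max(results, key=lambda label: results[label][1])
```

Because the synthetic criterion values were generated by a parabola, the parabolic form is selected here; with real ratings the ranking depends on the data, and the logarithmic forms require strictly positive X and Y values.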

As  for  the  reliability  analyses,  the  validity  analyses  were  conducted  at  the  case  level  with  the  criterion  value  for  each 
case  being  the  Fisher  Z  value  corresponding  to  the  correlation  of  that  case’s  experimental  scale  task  ratings  with  the 
corresponding  absolute  time  spent  scale  estimates.  Fisher  Z  values  were  averaged  for  all  cases  in  each  scale  type  and 
reconverted  to  an  average  correlation.  Time  1  and  time  2  data  were  considered  separately  and  combined.  The 
results  are  shown  in  Table  2. 

Table 2. Most Valid Form of Each Experimental Scale at Time 1 and Time 2
(For all scales, best functional form = parabolic)

                                                      Mean correlation
Scale                                           TIME 1   TIME 2   TIME 1/TIME 2
3-STAGE SCALE (N = 145),
  STAGE 3 (RATIO INTERVAL)                       .59      .62      .60
DIRECT MAGNITUDE (N = 130)                       .58      .65      .62
INDIRECT MAGNITUDE (N = 137), STAGE 1            .66      .71      .69
END-ANCHORED (N = 152)                           .85      .88      .87

Table 2 shows that for all scales, the parabolic equation (Y = a + b1X + b2X^2) provided the best fit of the
experimental scale data to the criterion data; that the best fit for the Three-Stage Scale was the ratio transformation of
data at Stage 3; and that the Stage 1 data of the Indirect Magnitude Scale provided a better fit than the Stage 2 data.
It can also be ascertained from Table 2 that the End-Anchored Graphical Scale provided a significantly better fit of
the criterion than any of the other experimental scales. A t-value of 4.03 (p < .001) was computed for the difference
between the combined time 1/time 2 correlation of .87 for the End-Anchored Graphical Scale vs. .69 for the Indirect
Magnitude Scale (second best scale). The average correlation for the Indirect Magnitude Scale, however, was not
found to be significantly different from the average correlations for the two remaining scales.

The  second  step  in  determining  the  comparative  validity  of  the  four  experimental  scales  was  to  confirm  the  superior 
validity  of  the  End-Anchored  Graphical  Scale  by  checking  to  see  whether  sampling  biases  regarding  the  types  of  jobs 
and  job  incumbents  represented  in  the  various  experimental  scale  groups  may  have  accounted  for  the  differences  in 
validity  among  the  scales. 
Although it was found that some variables, such as AFQT score and the average time it takes the rater to complete the
absolute time spent portion of the survey, were important contributors to validity, in general, there was little criterion
variance accounted for by these "nuisance" variables, and, after allowing for the variance accounted for by the job
classification and job incumbent variables, the End-Anchored Graphical Scale was still found to be significantly
superior to the other scales. This analysis did, however, find the Indirect Magnitude Scale to be significantly more
valid than the remaining scales as a result of holding the job classification and job incumbent variables constant.

The  major  conclusions  to  be  derived  from  these  scale  validity  analyses  are:  (1)  The  absolute  time  spent  criterion 
scale  appears  to  be  an  acceptable  criterion  relative  to  the  four  experimental  scales.  (2)  The  End-Anchored 
Graphical  Scale  appears  to  be  the  most  valid  of  the  experimental  scales.  (3)  The  Direct  Magnitude  Scale  and  the 
nine-point  relative  time  spent  scale  (Stage  1  of  the  Three-Stage  Scale)  appear  to  be  the  least  valid  of  all  the  scale 
alternatives. 

An appealing compromise scaling procedure that would use the End-Anchored Graphical Scale as the primary
measurement  device  would  be  to  have  the  rater  employ  the  absolute  time  spent  scaling  procedure  on  a  small  subset 
of  end-anchored-rated  tasks  covering  the  full  range  of  time  spent,  thus  enabling  the  conversion  of  the  End-Anchored 
Graphical  Scale  ratings  to  estimates  of  absolute  time  by  application  of  the  parabolic  functional  relationship.  Future 
R&D  is  planned  to  validate  and  refine  this  procedure  as  an  operational  spin-off  of  the  computer-administered  survey 
(CAS)  system. 


SUMMARY 

This  effort  has  produced  a  user-friendly,  PC-based  procedure  for  administering  occupational  surveys  to  job 
incumbents.  By  replacing  the  current  hard-copy  administration  procedure,  the  time  and  cost  (printing,  mailing  out, 
return  mail,  data  entry)  required  to  conduct  occupational  analyses  will  be  greatly  reduced,  and  a  more  effective  use 
of  resources  such  as  manpower,  time,  and  equipment  will  be  possible.  Quicker,  more  efficient  turnaround  time  will 
meet Air Force managers' requirements for fast, accurate information on which to base critical manpower, personnel,
and training decisions. In addition, the accuracy of individual and group job descriptions may be increased by
adapting the data gathering process to the "intelligent" interactive, survey tailoring capabilities of a PC-based
procedure  and  by  the  use  of  the  most  valid  and  reliable  PC-based  scaling  procedures  identified  by  the  analyses 
reported  in  this  paper.  A  questionnaire  to  assess  the  attitudes  of  raters  toward  the  computer-administered  survey 
(CAS)  procedure  yielded  positive  ratings  concerning  usage  of  the  software. 